4 research outputs found

    Repertoire-Specific Vocal Pitch Data Generation for Improved Melodic Analysis of Carnatic Music

    Deep Learning methods achieve state-of-the-art results in many tasks, including vocal pitch extraction. However, these methods rely on the availability of error-free pitch track annotations, which are scarce and expensive to obtain for Carnatic Music. Here we identify the tradition-related challenges and propose tailored solutions to generate a novel, large, and open dataset, the Saraga-Carnatic-Melody-Synth (SCMS), comprising audio mixtures and time-aligned vocal pitch annotations. Through a cross-cultural evaluation leveraging this novel dataset, we show improvements in the performance of Deep Learning vocal pitch extraction methods on Indian Art Music recordings. Additional experiments show that the trained models outperform the heuristic-based pitch extraction solutions currently used for the computational melodic analysis of Carnatic Music, and that this improvement leads to better results in the musicologically relevant task of repeated melodic pattern discovery when evaluated against expert annotations. The code and annotations are made available for reproducibility. The novel dataset and trained models are also integrated into the Python package compIAM, which allows them to be used out of the box.
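    As a minimal sketch of the out-of-the-box usage mentioned above (the model identifier string and audio path are assumptions for illustration; the compIAM documentation is the reference for the exact names available in a given version):

```python
# Hedged sketch: load the trained Carnatic vocal pitch extractor via compIAM
# and run it on a recording. Identifier and path are illustrative placeholders.
import compiam

model = compiam.load_model("melody:ftanet-carnatic")        # assumed identifier
pitch_track = model.predict("my_carnatic_recording.wav")    # time-stamped vocal pitch values
```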

    A diffusion-inspired training strategy for singing voice extraction in the waveform domain

    This work has been accepted at the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022.
    Notable progress in music source separation has been achieved using multi-branch networks that operate on both the temporal and spectral domains. However, such networks tend to be complex and heavyweight. In this work, we tackle the task of singing voice extraction from polyphonic music signals in an end-to-end manner, using an approach inspired by the training procedure of denoising diffusion models. We perform unconditional signal modelling to gradually convert an input mixture signal into the corresponding singing voice or accompaniment. We use fewer parameters than the state-of-the-art models while operating in the waveform domain, bypassing phase-related problems. More specifically, we train a non-causal WaveNet using a diffusion-inspired strategy, improving the network for singing voice extraction and obtaining performance comparable to the end-to-end state of the art on MUSDB18. We further report results on a non-MUSDB-overlapping version of MedleyDB and on the multi-track audio of the Saraga Carnatic dataset, showing good generalization, and run perceptual tests of our approach. Code, models, and audio examples are made available.
    This work was carried out under the projects Musical AI - PID2019-111403GB-I00/AEI/10.13039/501100011033 and NextCore - RTC2019-007248-7, funded by the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI).
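    The core idea of the training strategy can be illustrated with a simplified PyTorch step in which the vocals are progressively degraded towards the mixture and the network learns to undo the degradation. The linear degradation schedule, tensor shapes (batch, 1, samples), and L1 loss below are assumptions made for the sketch, not the paper's exact formulation:

```python
# Hedged sketch of a diffusion-inspired training step for singing voice extraction.
import torch

def training_step(model, mixture, vocals, optimizer, num_steps=8):
    """One gradient step: interpolate between clean vocals and the mixture at a
    random degradation level, then regress the network output back to the vocals."""
    batch = vocals.shape[0]
    # Sample a degradation level per example (0 = clean vocals, 1 = full mixture).
    t = torch.randint(1, num_steps + 1, (batch, 1, 1), device=vocals.device).float() / num_steps
    degraded = (1.0 - t) * vocals + t * mixture           # linear degradation path
    estimate = model(degraded)                            # e.g. a non-causal WaveNet-style network
    loss = torch.nn.functional.l1_loss(estimate, vocals)  # learn to recover the clean vocals
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```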

    Carnatic singing voice separation using cold diffusion on training data with bleeding

    This work has been accepted at the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy, October 5-9, 2023.
    Supervised music source separation systems based on deep learning are trained by minimizing a loss function between pairs of predicted separations and ground-truth isolated sources. However, open datasets comprising isolated sources are few, small, and restricted to a handful of music styles. At the same time, multi-track datasets with source bleeding are usually larger and easier to compile. In this work, we address the task of singing voice separation when the ground-truth signals have bleeding and only the target vocals and the corresponding mixture are available. We train a cold diffusion model in the frequency domain to iteratively transform a mixture into the corresponding vocals with bleeding. Next, we build the final separation masks by clustering spectrogram bins according to their evolution along the transformation steps. We test our approach in a Carnatic music scenario, for which only datasets with bleeding exist, while current research on this repertoire commonly relies on source separation models trained solely on Western commercial music. Our evaluation on a Carnatic test set shows that our system outperforms Spleeter in interference removal and is competitive in terms of signal distortion. The code is open sourced.
    This work was carried out under the project Musical AI - PID2019-111403GB-I00/AEI/10.13039/501100011033, funded by the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI).
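    The mask-building step described above (clustering spectrogram bins by how they evolve along the transformation steps) might look roughly like the following sketch. The choice of KMeans with two clusters and the heuristic for picking the vocal cluster are assumptions for illustration, not the paper's exact method:

```python
# Hedged sketch: derive vocal/accompaniment masks by clustering per-bin trajectories
# across the cold-diffusion transformation steps.
import numpy as np
from sklearn.cluster import KMeans

def masks_from_steps(step_mags):
    """step_mags: (num_steps, freq, time) magnitude spectrograms of the intermediate
    estimates produced at each transformation step, from mixture towards vocals."""
    num_steps, n_freq, n_time = step_mags.shape
    # One trajectory per time-frequency bin: its magnitude across the steps.
    trajectories = step_mags.reshape(num_steps, -1).T            # (bins, num_steps)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(trajectories)
    # Heuristic: the cluster keeping more energy at the final (vocal-like) step is the vocals.
    final = step_mags[-1].reshape(-1)
    vocal_cluster = int(final[labels == 1].mean() > final[labels == 0].mean())
    vocal_mask = (labels == vocal_cluster).reshape(n_freq, n_time).astype(float)
    return vocal_mask, 1.0 - vocal_mask
```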

    In search of Sañcāras: tradition-informed repeated melodic pattern recognition in Carnatic music

    This work has been accepted at the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022.
    Carnatic Music is a South Indian art and devotional musical practice in which melodic patterns (motifs and phrases), known as sañcāras, play a crucial structural and expressive role. We demonstrate how the combination of transposition-invariant features learnt by a Complex Autoencoder (CAE) and predominant pitch tracks extracted using a Frequency-Temporal Attention Network (FTANet) can be used to annotate and group regions of variable-length, repeated melodic patterns in audio recordings of multiple Carnatic Music performances. These models are trained on novel, expert-curated datasets of hundreds of Carnatic audio recordings, and the extraction process is tailored to account for the unique characteristics of sañcāras in Carnatic Music. Experimental results show that the proposed method is able to identify 54% of all sañcāras annotated by a professional Carnatic vocalist. Code to reproduce and interact with these results is available online.
    This research work was carried out under the project Musical AI - PID2019-111403GB-I00/AEI/10.13039/501100011033, funded by the Spanish Ministerio de Ciencia e Innovación and the Agencia Estatal de Investigación.
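    One common building block for this kind of repeated-pattern search is a self-similarity matrix over the learnt frame features, with unvoiced frames masked using the extracted pitch track. The sketch below illustrates that idea only; the array shapes, cosine similarity, and zero-pitch convention are assumptions, and the released code should be taken as the reference implementation:

```python
# Hedged sketch: frame-level self-similarity over transposition-invariant features,
# ignoring frames where no vocal pitch was detected.
import numpy as np

def self_similarity(features, pitch_track):
    """features: (num_frames, dim) embeddings (e.g. from a CAE).
    pitch_track: (num_frames,) predominant vocal pitch in Hz, 0 where unvoiced."""
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    ssm = normed @ normed.T                  # cosine similarity between all frame pairs
    voiced = pitch_track > 0
    ssm[~voiced, :] = 0.0                    # suppress frames without a vocal pitch
    ssm[:, ~voiced] = 0.0
    return ssm
```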